88 research outputs found

    Sparse Learning over Infinite Subgraph Features

    Full text link
    We present a supervised-learning algorithm from graph data (a set of graphs) for arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that under all possible subgraph features, several types of sparse learning, such as Adaboost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particularly emphasis is placed on simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive an unifying bounding technique for arbitrary separable functions. We then carefully use this bounding to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study the differences from the existing approaches in convergence property, selected subgraph features, and search-space sizes. We further discuss several unnoticed issues in sparse learning over all possible subgraph features.Comment: 42 pages, 24 figures, 4 table

    Mining significant substructure pairs for interpreting polypharmacology in drug-target network.

    Get PDF
    A current key feature in drug-target network is that drugs often bind to multiple targets, known as polypharmacology or drug promiscuity. Recent literature has indicated that relatively small fragments in both drugs and targets are crucial in forming polypharmacology. We hypothesize that principles behind polypharmacology are embedded in paired fragments in molecular graphs and amino acid sequences of drug-target interactions. We developed a fast, scalable algorithm for mining significantly co-occurring subgraph-subsequence pairs from drug-target interactions. A noteworthy feature of our approach is to capture significant paired patterns of subgraph-subsequence, while patterns of either drugs or targets only have been considered in the literature so far. Significant substructure pairs allow the grouping of drug-target interactions into clusters, covering approximately 75% of interactions containing approved drugs. These clusters were highly exclusive to each other, being statistically significant and logically implying that each cluster corresponds to a distinguished type of polypharmacology. These exclusive clusters cannot be easily obtained by using either drug or target information only but are naturally found by highlighting significant substructure pairs in drug-target interactions. These results confirm the effectiveness of our method for interpreting polypharmacology in drug-target network

    Machine learning refinement of in situ images acquired by low electron dose LC-TEM

    Full text link
    We study a machine learning (ML) technique for refining images acquired during in situ observation using liquid-cell transmission electron microscopy (LC-TEM). Our model is constructed using a U-Net architecture and a ResNet encoder. For training our ML model, we prepared an original image dataset that contained pairs of images of samples acquired with and without a solution present. The former images were used as noisy images and the latter images were used as corresponding ground truth images. The number of pairs of image sets was 1,2041,204 and the image sets included images acquired at several different magnifications and electron doses. The trained model converted a noisy image into a clear image. The time necessary for the conversion was on the order of 10ms, and we applied the model to in situ observations using the software Gatan DigitalMicrograph (DM). Even if a nanoparticle was not visible in a view window in the DM software because of the low electron dose, it was visible in a successive refined image generated by our ML model.Comment: 33 pages, 9 figure

    Machine learning reveals orbital interaction in crystalline materials

    Get PDF
    We propose a novel representation of crystalline materials named orbital-field matrix (OFM) based on the distribution of valence shell electrons. We demonstrate that this new representation can be highly useful in mining material data. Our experiment shows that the formation energies of crystalline materials, the atomization energies of molecular materials, and the local magnetic moments of the constituent atoms in transition metal--rare-earth metal bimetal alloys can be predicted with high accuracy using the OFM. Knowledge regarding the role of coordination numbers of transition-metal and rare-earth metal elements in determining the local magnetic moment of transition metal sites can be acquired directly from decision tree regression analyses using the OFM.Comment: 10 page

    SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining

    Get PDF
    Biclustering extracts coexpressed genes under certain experimental conditions, providing more precise insight into the genetic behaviors than one-dimensional clustering. For understanding the biological features of genes in a single bicluster, visualizations such as heatmaps or parallel coordinate plots and tools for enrichment analysis are widely used. However, simultaneously handling many biclusters still remains a challenge. Thus, we developed a web service named SiBIC, which, using maximal frequent itemset mining, exhaustively discovers significant biclusters, which turn into networks of overlapping biclusters, where nodes are gene sets and edges show their overlaps in the detected biclusters. SiBIC provides a graphical user interface for manipulating a gene set network, where users can find target gene sets based on the enriched network. This chapter provides a user guide/instruction of SiBIC with background of having developed this software. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/sibic/faces/index.jsp

    Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data

    Get PDF
    Motivation: We address the issue of finding a three-way gene interaction, i.e. two interacting genes in expression under the genotypes of another gene, given a dataset in which expressions and genotypes are measured at once for each individual. This issue can be a general, switching mechanism in expression of two genes, being controlled by categories of another gene, and finding this type of interaction can be a key to elucidating complex biological systems. The most suitable method for this issue is likelihood ratio test using logistic regressions, which we call interaction test, but a serious problem of this test is computational intractability at a genome-wide level

    Mining metabolic pathways through gene expression

    Get PDF
    Motivation: An observed metabolic response is the result of the coordinated activation and interaction between multiple genetic pathways. However, the complex structure of metabolism has meant that a compete understanding of which pathways are required to produce an observed metabolic response is not fully understood. In this article, we propose an approach that can identify the genetic pathways which dictate the response of metabolic network to specific experimental conditions
    corecore